Offensive Language Detection in Arabic Social Networks Using Evolutionary-Based Classifiers Learned From Fine-Tuned Embeddings

Abstract

Social networks facilitate communication between people from all over the world. Unfortunately, the excessive use of social networks has led to a rise in antisocial behaviors such as the spread of online offensive language, cyberbullying (CB), and hate speech (HS). Therefore, detecting abusive language has become a crucial part of addressing cyberharassment. Manual detection of cyberharassment is cumbersome, slow, and not even feasible for rapidly growing data. In this study, we address the challenges of automatically detecting offensive tweets in the Arabic language. The main contribution of this study is to design and implement an intelligent prediction system encompassing a two-stage optimization approach to identify and classify offensive and non-offensive text. In the first stage, the proposed approach fine-tunes pre-trained word embedding models by training them for several epochs on the training dataset. The embeddings of the vocabulary in the new dataset are trained and added to the old embeddings. In the second stage, it employs a hybrid approach that combines two classifiers, namely XGBoost and SVM, with a genetic algorithm (GA) to mitigate the classifiers' drawback of having to find the optimal hyperparameter values needed to run the proposed approach. We tested the proposed approach on the Arabic Cyberbullying Corpus (ArCybC), which contains tweets collected from four Twitter domains: gaming, sports, news, and celebrities. ArCybC has four categories: sexual, racial, intelligence, and appearance. The proposed approach produced superior results: the SVM with the Aravec SkipGram word embedding model achieved an accuracy rate of 88.2% and an F1-score of 87.8%.
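The second stage described in the abstract, a genetic algorithm searching for classifier hyperparameters, can be illustrated with a minimal sketch. This is not the authors' implementation: the search space (`log_C`, `log_gamma`), the operators (tournament selection, uniform crossover, Gaussian mutation, elitism), and the `toy_fitness` function are all assumptions chosen for illustration. In a real setting the fitness would be something like the cross-validated accuracy of an SVM trained on the fine-tuned embeddings.

```python
import random

# Hypothetical search space: log10 ranges for the SVM parameters C and gamma.
BOUNDS = {"log_C": (-2.0, 3.0), "log_gamma": (-4.0, 1.0)}


def toy_fitness(ind):
    """Stand-in for cross-validated accuracy; peaks at log_C=1, log_gamma=-2."""
    return -((ind["log_C"] - 1.0) ** 2 + (ind["log_gamma"] + 2.0) ** 2)


def random_individual(rng):
    """Draw one candidate uniformly from the search space."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}


def tournament(pop, scores, rng, k=3):
    """Pick the best of k randomly chosen individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def crossover(a, b, rng):
    """Uniform crossover: each gene comes from either parent."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in BOUNDS}


def mutate(ind, rng, rate=0.2, scale=0.3):
    """Gaussian mutation, clipped to the search bounds."""
    out = dict(ind)
    for k, (lo, hi) in BOUNDS.items():
        if rng.random() < rate:
            out[k] = min(hi, max(lo, out[k] + rng.gauss(0.0, scale)))
    return out


def genetic_search(fitness, pop_size=20, generations=30, seed=0):
    """Evolve a population and return the fittest individual found."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        elite = pop[max(range(pop_size), key=lambda i: scores[i])]
        children = [elite]  # elitism: keep the best individual unchanged
        while len(children) < pop_size:
            p1 = tournament(pop, scores, rng)
            p2 = tournament(pop, scores, rng)
            children.append(mutate(crossover(p1, p2, rng), rng))
        pop = children
    return max(pop, key=fitness)


best = genetic_search(toy_fitness)
print(best, toy_fitness(best))
```

To apply the same loop to a real classifier, one option would be to replace `toy_fitness` with a function that returns the mean of scikit-learn's `cross_val_score` for `SVC(C=10 ** ind["log_C"], gamma=10 ** ind["log_gamma"])` on the embedded tweets; that wiring is a suggestion here, not something the abstract specifies.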

Related resources

Offensive Language Detection Using Multi-level Classification

Text messaging through the Internet or cellular phones has become a major medium of personal and commercial communication. At the same time, flames (such as rants, taunts, and squalid phrases) are offensive or abusive phrases that might attack or offend users for a variety of reasons. An automatic discriminative software with a sensitivity parameter for flame or abusive language detection wou...

Abusive Language Detection on Arabic Social Media

In this paper, we present our work on detecting abusive language on Arabic social media. We extract a list of obscene words and hashtags using common patterns used in offensive and rude communications. We also classify Twitter users according to whether they use any of these words or not in their tweets. We expand the list of obscene words using this classification, and we report results on a n...

Overlapping Community Detection in Social Networks Based on Stochastic Simulation

Community detection is a task of fundamental importance in social network analysis. Community structures enable us to discover the hidden interactions among the network entities and to summarize network information that can be applied in many domains, such as bioinformatics, finance, e-commerce, and forensic science. There exist a variety of methods for community detection based on diffe...

Recognition of Arabic Sign Language Alphabet Using Polynomial Classifiers

Building an accurate automatic sign language recognition system is of great importance in facilitating efficient communication with deaf people. In this paper, we propose the use of polynomial classifiers as a classification engine for the recognition of Arabic sign language (ArSL) alphabet. Polynomial classifiers have several advantages over other classifiers in that they do not require iterat...

Mining Offensive Language on Social Media

The present research deals with the automatic annotation and classification of vulgar and offensive speech on social media. In this paper we test the effectiveness of the computational treatment of taboo content shared on the web. The output is a corpus of 31,749 Facebook comments which has been automatically annotated through a lexicon-based method for the automatic identific...

Journal

Journal title: IEEE Access

Year: 2022

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2022.3190960